Add missing memory estimates#2255
Conversation
bbc7e02 to
7c200d4
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughMemory estimation in ChangesCAGRA and IVF-PQ memory estimation updates
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.23.0)cpp/src/neighbors/detail/cagra/cagra_helpers.cpp┌──────────────┐ �[32m✔�[39m �[1mOpengrep OSS�[0m [00.21][ERROR]: unable to find a config; path Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/neighbors/detail/cagra/cagra_helpers.cpp`:
- Around line 184-208: The extend_gpu_mem variable is being calculated when
add_data_on_build is true but is not being included in the total_dev
calculation. Fix this by adding extend_gpu_mem to the std::max initializer list
that computes total_dev alongside kmeans_gpu_mem, search_phase_dev, and
gpu_workspace_size to ensure the extend-phase GPU memory requirement is properly
accounted for in the final device memory estimate.
- Around line 159-239: The `ivf_pq_build_mem_usage` function has a broken
control flow where the return statement is inside the debug logging condition
block, causing a compile error when debug logging is disabled. Additionally, the
code references an undefined variable `pq_params` when the actual parameter is
`params`. To fix this, move the return statement that returns
`std::make_pair(total_host, total_dev)` to the end of the function after the
closing brace of the debug logging if-block, change
`pq_params.build_params.add_data_on_build` to
`params.build_params.add_data_on_build`, and ensure all code after the if-block
has correct indentation so the function properly exits regardless of whether
debug logging is enabled.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b7aa4e21-abed-421b-b84f-0824ce932a43
📒 Files selected for processing (1)
cpp/src/neighbors/detail/cagra/cagra_helpers.cpp
7c200d4 to
3e27c6e
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
cpp/src/neighbors/detail/cagra/cagra_helpers.cpp (1)
185-185: 🎯 Functional Correctness | 🔴 Critical | ⚡ Quick winCRITICAL:
pq_paramsis undefined in this scope; useparams.The function parameter is
params(Line 143);pq_paramsdoes not exist here, so this won't compile.🐛 Proposed fix
- if (pq_params.build_params.add_data_on_build) { + if (params.build_params.add_data_on_build) { extend_gpu_mem = ivf_pq_extend_mem_usage(dataset, params, dtype_size); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/neighbors/detail/cagra/cagra_helpers.cpp` at line 185, Replace the undefined variable `pq_params` with the correct parameter name `params` in the condition at line 185 where `pq_params.build_params.add_data_on_build` is being accessed. The function parameter is named `params` (defined on line 143), not `pq_params`, so update the reference to use the correct variable name to ensure the code compiles.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/neighbors/detail/cagra/cagra_helpers.cpp`:
- Around line 87-99: The if block in the optimize_workspace_size function that
checks raft::default_logger().should_log(rapids_logger::level_enum::debug) is
opened with a brace but never closed, causing the total_host, total_host_fixed,
total_dev, total_dev_fixed variable declarations and the return statement to be
incorrectly nested inside the if block. Add a closing brace immediately after
the debug_host_size calculation (after the hist comment) and before the size_t
total_host declaration to properly close the if block and allow the function to
execute correctly in both debug and non-debug code paths.
- Around line 101-136: The ivf_pq_extend_mem_usage function lacks validation for
the pq_dim parameter, which can default to 0 when using the default constructor
and will cause division by zero when used in raft::round_up_safe and in the
code_bytes_per_vec calculation. Add a RAFT_EXPECTS check at the beginning of the
function to ensure params.build_params.pq_dim is non-zero before it is used in
the rot_dim assignment and the code_bytes_per_vec calculation.
---
Duplicate comments:
In `@cpp/src/neighbors/detail/cagra/cagra_helpers.cpp`:
- Line 185: Replace the undefined variable `pq_params` with the correct
parameter name `params` in the condition at line 185 where
`pq_params.build_params.add_data_on_build` is being accessed. The function
parameter is named `params` (defined on line 143), not `pq_params`, so update
the reference to use the correct variable name to ensure the code compiles.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d30ab270-2969-474d-8d34-226155d494b2
📒 Files selected for processing (1)
cpp/src/neighbors/detail/cagra/cagra_helpers.cpp
achirkin
left a comment
There was a problem hiding this comment.
Hi @huuanhhuyn , I have some concerns about estimating the workspace memory, (see below). Otherwise, looks good.
| size_t rot_dim = raft::round_up_safe<size_t>(dim, pq_dim); | ||
| size_t n_clusters = params.build_params.n_lists; | ||
|
|
||
| RAFT_EXPECTS(pq_dim > 0, "pq_dim should not be 0"); |
There was a problem hiding this comment.
nitpick: I don't think this check is necessary - it doesn't guard against anything dangerous below (no divisions) and its valid values are checked in IVF-PQ during build anyway.
There was a problem hiding this comment.
The AI review suggested adding this check.
| // Placeholder freed before resize_list | ||
| size_t device_size = std::max(placeholder_dev, resize_lists_dev); | ||
|
|
||
| return device_size + workspace_size; |
There was a problem hiding this comment.
I think we need to be careful to not account for the same memory multiple times. As far as I remember, we add the whole workspace size as used somewhere above in the call chain. The intuition behind that is that the algorithms are always free to use all of the workspace size (e.g. by changing the batch size). Please make sure your changes are consistent with the rest of the logic in that regard.
There was a problem hiding this comment.
I don't think we have any duplicated estimate for the placeholder and resize lists within our cagra_build_mem_usage() call chain.
The estimate would be the upper-bound for the real allocation because the max_batch_size could be shrinked further on the real path and the alignment "waste" will be less. The memory resource tracking shows that the over estimation is not too significant (e.g. 280MiB vs 230MiB).
There was a problem hiding this comment.
Do you think I should add the shrinking logics based on the total free mem? This would improve the over-estimation and it will still be the upper bound afterwards.
With the current version, if the smallest free mem is 1 GiB and the largest dim is 4000 then the max_batch_size could be over-estimated by approx. 4x (the real batch size is shrinked to 16k).
Add comment to explain extra debug allocation Co-authored-by: Artem M. Chirkin <9253178+achirkin@users.noreply.github.com>
c0d09e0 to
677f67e
Compare
|
@achirkin Thanks for your review. Could you please look at it again? |
Add missing memory usage estimates: